Practical session on Image Retrieval

In this practical session, we will explore how to perform typical tasks associated with image retrieval. Students will be able to download this IPython/Jupyter notebook after the class in order to repeat the experiments at home.

Link to the slides: PDF

Step 1: Create a new copy of this notebook

To follow this tutorial, please create a copy of this notebook and name the copy after yourself. Do not run this notebook directly: it is read-only for students, so your changes cannot be saved. To get started:

  1. Click File -> Make a Copy
  2. Wait for the new tab to open
  3. Click on the name of the notebook (probably Tutorial-CopyN) and rename it to 'Tutorial-yourname.lastname'.
  4. Close the previous tab containing this notebook (in order to avoid any mistakes when following the tutorial)
  5. Resume reading this tutorial from this point forward

Step 2: Basics for executing a Jupyter Notebook:

To run a cell, select it and press 'shift + enter'.

Step 3: Send us your answers to the questions below

While working through this notebook, you will find some questions that need to be answered. Please write your answers in a separate text file and send it to us by e-mail at rafael.sampaio-de-rezende@naverlabs.com. If you work with other people during the practical session, you can send a single answer file for a group of two or three people.

Preparation

We start by importing the necessary modules and fixing a random seed. Please select the cell below and press 'shift+enter':

In [1]:
import numpy as np
from numpy.linalg import norm
import torch
from torch import nn
import json
import pdb
import sys
import os.path as osp
import pandas as pd
from PIL import Image
import warnings

from datasets import create
from archs import *
from utils.test import extract_query
from utils.tsne import do_tsne

np.random.seed(0)

print('Ready!')
Ready!

Now, let's start by instantiating the Oxford dataset, which we will use in all the following experiments.

In [2]:
# create Oxford 5k database
dataset = create('Oxford')

We can now query for some aspects of this dataset, such as the number of images, number of classes, the name of the different classes, and the class label for each of the images in the dataset:

In [3]:
print('Dataset: ' + dataset.dataset_name)
print()

labels = dataset.get_label_vector()
classes = dataset.get_label_names()

print('Number of images:  ' + str(labels.shape[0]))
print('Number of classes: ' + str(classes.shape[0]))
print()
print('Class names: ' + str(classes))
Dataset: Oxford

Number of images:  5063
Number of classes: 11

Class names: ['all_souls' 'ashmolean' 'balliol' 'bodleian' 'christ_church' 'cornmarket'
 'hertford' 'keble' 'magdalen' 'pitt_rivers' 'radcliffe_camera']

Now, let's load a list of models we can use in this tutorial:

In [4]:
# load the dictionary of the available models and features
with open('data/models.json', 'r') as fp:
    models_dict = json.load(fp)

pd.DataFrame(models_dict).T # show the loaded models onscreen
Out[4]:
dataset queries training weights
alexnet-cls-imagenet-fc7 data/features/alexnet-cls-imagenet-fc7_ox.npy data/features/alexnet-cls-imagenet-fc7_oxq.npy NaN NaN
alexnet-cls-lm-fc7 data/features/alexnet-cls-lm-fc7_ox.npy data/features/alexnet-cls-lm-fc7_oxq.npy NaN NaN
alexnet-cls-lm-gem data/features/alexnet-cls-lm_ox.npy data/features/alexnet-cls-lm_oxq.npy NaN NaN
resnet18-cls-imagenet-gem data/features/resnet18-cls-imagenet_ox.npy data/features/resnet18-cls-imagenet_oxq.npy NaN NaN
resnet18-cls-lm-gem data/features/resnet18-cls-lm_ox.npy data/features/resnet18-cls-lm_oxq.npy NaN NaN
resnet18-cls-imagenet-gem-pcaw data/features/resnet18-cls-imagenet-pca_ox.npy data/features/resnet18-cls-imagenet-pca_oxq.npy NaN NaN
resnet18-cls-lm-gem-pcaw data/features/resnet18-cls-lm-pca_ox.npy data/features/resnet18-cls-lm-pca_oxq.npy NaN NaN
resnet18-rnk-lm-gem data/features/resnet18-rnk-lm_ox.npy data/features/resnet18-rnk-lm_oxq.npy NaN NaN
resnet18-rnk-lm-gem-da data/features/resnet18-rnk-lm-da_ox.npy data/features/resnet18-rnk-lm-da_oxq.npy NaN data/models/resnet18-rnk-lm-da.pt
resnet18-rnk-lm-gem-da-mr data/features/resnet18-rnk-lm-da_mr_ox.npy data/features/resnet18-rnk-lm-da_mr_oxq.npy NaN data/models/resnet18-rnk-lm-da.pt
resnet50-cls-imagenet-gem data/features/resnet50-cls-imagenet_ox.npy data/features/resnet50-cls-imagenet_oxq.npy NaN data/models/resnet50-cls-imagenet.pt
resnet50-cls-lm-gem data/features/resnet50-cls-lm_ox.npy data/features/resnet50-cls-lm_oxq.npy NaN NaN
resnet50-cls-imagenet-gem-pcaw data/features/resnet50-cls-imagenet-pca_ox.npy data/features/resnet50-cls-imagenet-pca_oxq.npy NaN NaN
resnet50-cls-lm-gem-pcaw data/features/resnet50-cls-lm-pca_ox.npy data/features/resnet50-cls-lm-pca_oxq.npy NaN NaN
resnet50-rnk-lm-gem data/features/resnet50-rnk-lm_ox.npy data/features/resnet50-rnk-lm_oxq.npy NaN data/models/resnet50-rnk-lm.pt
resnet50-rnk-lm-gem-da data/features/resnet50-rnk-lm-da_ox.npy data/features/resnet50-rnk-lm-da_oxq.npy data/features/resnet50-rnk-lm-da_lmforPQ.npy data/models/resnet50-rnk-lm-da.pt
resnet50-rnk-lm-gem-da-mr data/features/resnet50-rnk-lm-da_mr_ox.npy data/features/resnet50-rnk-lm-da_mr_oxq.npy NaN data/models/resnet50-rnk-lm-da.pt
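
In the table above, NaN means that no file is listed for that field. As a minimal sketch, the snippet below picks out the models for which a given file is listed. It uses a tiny inline stand-in for data/models.json (two rows copied from the table, with the unlisted fields simply left out, which is an assumption about how the real file is written). The helper name models_with is hypothetical and not part of the tutorial code.

```python
import json

# Tiny stand-in for data/models.json: two rows copied from the table above.
# Assumption: fields shown as NaN in the table are simply absent here.
sample = '''
{
  "alexnet-cls-imagenet-fc7": {
    "dataset": "data/features/alexnet-cls-imagenet-fc7_ox.npy",
    "queries": "data/features/alexnet-cls-imagenet-fc7_oxq.npy"
  },
  "resnet50-rnk-lm-gem": {
    "dataset": "data/features/resnet50-rnk-lm_ox.npy",
    "queries": "data/features/resnet50-rnk-lm_oxq.npy",
    "weights": "data/models/resnet50-rnk-lm.pt"
  }
}
'''
models = json.loads(sample)


def models_with(models, field):
    """Return the sorted names of the models whose `field` holds a path."""
    # A listed file is a string; a missing key or a NaN/null value is not.
    return sorted(name for name, entry in models.items()
                  if isinstance(entry.get(field), str))


print(models_with(models, 'weights'))
print(models_with(models, 'training'))
```

With the real dictionary loaded in the cell above, the same call would be models_with(models_dict, 'weights').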

Part 1: Training

In this first part of the tutorial, we will study how different changes in the training pipeline (e.g. choice of model, pooling, and post-processing options) can change the quality of results we obtain.

a) Creating a network with the AlexNet architecture

As a first step, we will be creating a neural network implementing the AlexNet architecture to use in our experiments.

In [5]:
# instantiate the model for the first experiment
model_1a = alexnet_imagenet()

# show the network details
print(model_1a)
AlexNet(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
    (1): ReLU(inplace)
    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (4): ReLU(inplace)
    (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): ReLU(inplace)
    (8): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): ReLU(inplace)
    (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace)
    (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (classifier): Sequential(
    (0): Dropout(p=0.5)
    (1): Linear(in_features=9216, out_features=4096, bias=True)
    (2): ReLU(inplace)
    (3): Dropout(p=0.5)
    (4): Linear(in_features=4096, out_features=4096, bias=True)
    (5): ReLU(inplace)
    (6): Linear(in_features=4096, out_features=1000, bias=True)
  )
)

Now, we could use this model to extract features for all images in our dataset. To save time, we have already precomputed those features and stored them on disk.

To load the features computed by this model from disk, run the cell below:

In [6]:
dfeats = np.load(models_dict['alexnet-cls-imagenet-fc7']['dataset'])
In [7]:
pd.DataFrame(dfeats)
Out[7]:
0 1 2 3 4 5 6 7 8 9 ... 4086 4087 4088 4089 4090 4091 4092 4093 4094 4095
0 -0.035850 -0.027481 -0.002729 -0.001876 0.000510 -0.026890 -0.031041 0.008593 -0.036306 -0.011108 ... -0.000412 -0.031039 -0.011067 -0.017820 -0.013743 0.002906 -0.013534 -0.010850 -0.007984 -0.009079
1 -0.019589 -0.024991 0.003753 -0.016643 -0.001419 -0.025324 -0.030376 -0.015020 -0.033136 -0.008341 ... -0.011891 -0.025685 -0.012602 -0.019166 -0.008587 -0.010230 -0.015686 -0.003145 -0.027240 -0.005164
2 -0.031182 -0.015624 0.003333 -0.010716 -0.000947 -0.037874 -0.024973 -0.022489 -0.024565 -0.005231 ... -0.004073 -0.022185 -0.011347 -0.011516 -0.016464 -0.009736 -0.011495 -0.011072 -0.034790 -0.005476
3 -0.018782 -0.022565 0.002240 -0.010076 -0.001437 -0.020373 -0.020782 -0.029803 -0.025968 -0.005636 ... -0.002036 -0.015342 -0.007680 -0.017931 -0.013359 -0.027249 -0.012993 0.000823 -0.041952 0.002931
4 -0.016079 -0.034830 -0.011318 -0.011594 -0.001433 -0.015717 -0.019140 -0.014942 -0.027230 -0.001502 ... -0.008466 -0.026986 -0.014449 -0.024464 -0.024707 -0.029167 -0.015347 -0.011359 -0.034020 -0.010275
5 -0.027226 -0.019598 0.002683 -0.018460 -0.002139 -0.034594 -0.020992 -0.021336 -0.028204 -0.004704 ... -0.004012 -0.021141 -0.012326 -0.013401 -0.012377 -0.015479 -0.010811 -0.012446 -0.042647 -0.002835
6 -0.020550 -0.007709 -0.000061 -0.010187 -0.000725 -0.028299 -0.022015 -0.020881 -0.003965 0.002100 ... -0.007808 -0.027547 -0.007727 -0.017005 -0.013676 -0.014634 -0.014620 -0.006139 -0.039100 -0.000403
7 -0.025968 -0.023937 0.003283 -0.019003 -0.001922 -0.033658 -0.013863 -0.016475 -0.026929 -0.005424 ... -0.005924 -0.025334 -0.012016 -0.018541 -0.009853 -0.015122 -0.012085 -0.008246 -0.040714 -0.006721
8 -0.028738 -0.019763 0.004495 -0.010969 -0.001037 -0.035472 -0.024952 -0.001374 -0.019975 -0.005449 ... -0.004924 -0.013961 -0.010573 -0.020533 -0.017181 -0.007581 -0.009856 -0.000078 -0.032787 0.006912
9 -0.038494 0.011658 -0.000689 -0.011762 0.001312 -0.029404 -0.032518 -0.013561 0.012424 0.003670 ... -0.016833 -0.005571 -0.015611 -0.017843 -0.023685 -0.013018 -0.014558 -0.006085 0.001898 0.006085
10 -0.027081 -0.007277 -0.000806 -0.006623 -0.002354 -0.024626 -0.031134 -0.012121 -0.008267 0.002884 ... -0.004200 -0.019656 -0.007452 -0.016826 -0.017452 -0.009015 -0.010230 -0.006387 -0.020137 -0.006993
11 -0.021348 -0.028567 0.004167 -0.010147 -0.000411 -0.031299 -0.029296 -0.018611 -0.025041 -0.006971 ... -0.005016 -0.022236 -0.011324 -0.013937 -0.011272 -0.017679 -0.014061 -0.007294 -0.038576 0.001956
12 -0.017281 -0.025579 0.010256 0.000060 -0.002708 -0.026455 -0.035257 -0.019940 -0.031574 -0.005245 ... -0.010997 -0.022230 -0.009144 -0.017470 -0.006658 -0.024912 -0.014827 -0.005194 -0.021209 -0.008566
13 -0.027684 -0.027197 0.001612 -0.009720 -0.001331 -0.031998 -0.027507 -0.018030 -0.026168 -0.003904 ... -0.001202 -0.026214 -0.007833 -0.014745 -0.016136 -0.015063 -0.006135 -0.004892 -0.032761 -0.002260
14 -0.025850 -0.011257 -0.003838 -0.021394 -0.001046 -0.031310 -0.027649 -0.003997 -0.031237 0.001782 ... 0.000880 -0.022625 0.000586 -0.015039 -0.012922 -0.006825 -0.001636 0.004990 -0.014997 -0.010761
15 -0.024928 0.005489 -0.006138 -0.019530 0.000773 -0.023293 -0.017519 0.005392 -0.015649 0.005732 ... -0.015822 -0.018471 -0.001913 -0.003494 -0.025474 -0.002241 -0.017720 0.000635 -0.007782 -0.013381
16 -0.029517 -0.016332 -0.004522 -0.016007 0.000871 -0.029457 -0.022451 -0.009513 -0.027515 0.004762 ... -0.007752 -0.021261 -0.004416 -0.020123 -0.022609 -0.008232 -0.019517 -0.006908 -0.018141 -0.009365
17 -0.024838 -0.037145 0.000010 0.000810 -0.001789 -0.018664 -0.027436 -0.011018 -0.024205 -0.003431 ... -0.003554 -0.031075 -0.014735 -0.017898 -0.020833 -0.018709 -0.018186 -0.005131 -0.026268 -0.004622
18 -0.015125 -0.017970 -0.002840 -0.016026 -0.000481 -0.013773 -0.034277 -0.006037 -0.008685 -0.012368 ... 0.001357 -0.017796 -0.014535 -0.007374 -0.006194 -0.022500 -0.014345 -0.010090 -0.022787 -0.007547
19 -0.012851 -0.023673 0.006758 -0.026409 -0.003080 -0.022918 -0.009414 -0.001949 -0.031154 -0.010514 ... -0.006074 -0.020318 -0.007159 -0.023952 -0.013970 -0.005516 -0.017243 -0.003798 -0.038834 -0.007877
20 -0.021437 -0.017208 0.006442 -0.009458 -0.002067 -0.024592 -0.027187 -0.017311 -0.029715 -0.004254 ... -0.010892 -0.018579 -0.007896 -0.015672 -0.013698 -0.011299 -0.011999 -0.009641 -0.032878 -0.004807
21 -0.026983 -0.025474 0.001393 -0.007402 -0.000908 -0.024785 -0.034075 -0.014797 -0.025252 -0.007689 ... -0.003502 -0.026492 -0.009455 -0.018022 -0.006087 -0.017398 -0.013307 -0.001966 -0.036382 0.002497
22 -0.019076 -0.022870 0.007427 -0.012212 -0.002586 -0.024579 -0.018077 -0.019769 -0.024745 -0.006681 ... -0.005815 -0.021232 -0.012198 -0.018913 -0.009015 -0.018720 -0.015896 -0.004090 -0.033730 0.001615
23 -0.027354 -0.035555 -0.003456 -0.003912 -0.001060 -0.020576 -0.029377 -0.005573 -0.018704 -0.004495 ... -0.011667 -0.013758 -0.008614 -0.027858 -0.007884 -0.022516 -0.014263 -0.007572 -0.023820 0.004032
24 -0.005488 -0.008238 -0.001457 -0.010426 -0.000172 -0.024302 -0.038577 -0.010533 -0.020359 0.003372 ... -0.001536 -0.001783 -0.002038 -0.015738 -0.006963 -0.019664 -0.010886 0.000494 -0.031659 -0.003276
25 -0.023403 -0.024348 0.004184 -0.010916 -0.003850 -0.020622 -0.021641 -0.021715 -0.026749 -0.007291 ... -0.001867 -0.019853 -0.011301 -0.016616 -0.006979 -0.016187 -0.015919 -0.011331 -0.040050 -0.004650
26 -0.019293 -0.033486 -0.006621 0.013770 0.000776 -0.027782 -0.032088 -0.009887 -0.024583 -0.003984 ... -0.016243 -0.025675 -0.009360 -0.030824 -0.019758 -0.033123 -0.016040 0.006885 -0.019548 -0.000967
27 -0.034574 -0.030143 -0.000004 -0.009200 0.000717 -0.023255 -0.030869 -0.007275 -0.028785 -0.005148 ... -0.005506 -0.015342 -0.014663 -0.022323 -0.020380 -0.023321 -0.013918 -0.000593 -0.029029 0.010854
28 -0.018187 -0.021934 0.008050 0.003586 -0.002963 -0.024223 -0.034990 -0.017086 -0.024245 -0.007549 ... -0.011252 -0.029786 -0.011785 -0.017243 -0.007982 -0.024860 -0.019540 0.001454 -0.015303 -0.007190
29 -0.033420 -0.007771 0.001079 -0.013624 -0.000201 -0.019443 -0.026390 -0.009942 -0.009914 -0.002204 ... -0.002136 -0.018941 -0.020526 -0.021125 -0.020331 -0.004055 0.003452 0.004322 -0.013003 -0.008267
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
5033 -0.000347 -0.023440 0.001632 -0.006777 -0.002510 -0.008534 -0.018901 -0.005633 -0.009391 0.000800 ... -0.013342 -0.027927 -0.006946 -0.026756 -0.004879 -0.012207 -0.006081 -0.007615 -0.017928 -0.002302
5034 -0.025202 -0.017172 -0.009534 -0.014765 -0.002114 -0.004067 -0.035919 -0.008954 -0.025238 -0.002243 ... -0.024637 0.004877 -0.018746 -0.011383 -0.008009 -0.013655 -0.012078 -0.019453 0.000395 -0.007588
5035 -0.005361 -0.022537 -0.001587 -0.003194 -0.003594 -0.006201 -0.018159 -0.005939 -0.011017 0.001679 ... -0.015504 -0.016143 -0.004584 -0.025722 0.000164 -0.008541 -0.009205 -0.010478 -0.034952 0.002825
5036 -0.020266 -0.016555 -0.008265 -0.016197 -0.002761 -0.013680 -0.028658 -0.000870 -0.020343 0.001050 ... -0.024749 0.003465 -0.014469 -0.021091 -0.004518 -0.010140 -0.023971 -0.014724 -0.011519 0.007968
5037 -0.020842 -0.018926 -0.007261 -0.019875 -0.002341 -0.002577 -0.026522 -0.009629 -0.015573 0.000088 ... -0.021622 0.006516 -0.012704 -0.016555 -0.009661 -0.009772 -0.017503 -0.017996 -0.011112 -0.007588
5038 -0.010701 -0.024621 -0.008893 -0.008442 -0.002328 -0.013485 -0.034279 -0.008667 -0.005068 -0.004631 ... -0.014453 -0.003912 -0.011439 -0.027973 -0.001685 -0.016979 -0.017930 -0.008529 -0.028288 0.006931
5039 -0.017619 -0.013999 -0.008077 -0.012367 -0.003130 -0.000583 -0.020810 -0.005786 -0.022166 0.003341 ... -0.024194 0.004501 -0.013348 -0.018696 -0.001085 0.002192 -0.020694 -0.014153 -0.012470 0.001466
5040 -0.016383 -0.009473 -0.002218 0.004001 -0.003365 -0.007883 -0.020073 -0.002648 -0.013074 0.003931 ... -0.019501 -0.007133 -0.009600 -0.022009 0.006583 0.004134 -0.014014 -0.005786 -0.025588 0.005945
5041 -0.016378 -0.015357 -0.007425 -0.001752 -0.004289 0.003992 -0.028638 -0.012099 -0.006005 -0.001575 ... -0.019578 -0.002389 -0.009048 -0.012606 -0.003342 -0.008900 -0.018356 -0.020209 -0.026405 -0.000068
5042 -0.006824 -0.024732 0.002949 -0.022483 -0.003118 -0.010542 -0.017331 -0.009628 -0.017250 -0.000993 ... -0.018800 -0.023388 -0.012136 -0.029079 -0.004281 -0.005168 -0.009608 -0.010838 -0.015683 -0.007684
5043 -0.018325 -0.020116 -0.011342 -0.003993 -0.004546 -0.003612 -0.022753 -0.005329 -0.020328 0.002754 ... -0.023329 0.001346 -0.013048 -0.019631 -0.004764 -0.002181 -0.014738 -0.021274 -0.016126 -0.001293
5044 -0.007036 -0.024572 -0.002254 -0.009914 -0.001902 -0.015593 -0.015422 -0.003129 -0.019139 -0.003510 ... -0.009001 -0.012072 -0.001284 -0.028293 -0.005345 -0.016693 -0.023231 -0.006365 -0.031506 -0.013744
5045 -0.011654 -0.020922 -0.001366 -0.006736 -0.003982 -0.005101 -0.033442 -0.008649 -0.004348 -0.003097 ... -0.014633 -0.006092 -0.014136 -0.024329 -0.004336 -0.011019 -0.022371 -0.013520 -0.010478 -0.001287
5046 -0.022540 -0.028606 0.002502 -0.012492 0.001526 -0.020187 -0.019245 -0.004350 0.000012 -0.008098 ... 0.012325 -0.015279 -0.012979 -0.012820 -0.027227 -0.022258 -0.004827 -0.005391 -0.026472 -0.012411
5047 -0.026575 -0.023154 -0.000724 -0.029461 -0.003927 -0.020665 -0.030278 -0.008159 -0.033438 -0.009544 ... -0.011195 -0.015280 -0.013788 -0.019564 -0.019466 -0.009008 -0.024397 -0.012796 -0.034644 -0.011108
5048 -0.024395 -0.027769 -0.008896 -0.018073 0.000822 -0.012914 -0.030332 -0.007687 -0.021201 -0.000398 ... -0.000164 -0.024234 -0.005436 -0.020709 0.000338 -0.016295 -0.005969 0.000478 -0.041846 -0.006876
5049 -0.012741 0.003680 -0.005888 -0.032018 0.001434 -0.027847 -0.021405 0.001170 -0.016321 0.004913 ... -0.013130 -0.017558 -0.004320 -0.010863 -0.004026 -0.020087 -0.014901 -0.012173 -0.018322 -0.008402
5050 -0.016582 -0.020231 -0.011685 -0.010779 -0.001648 -0.017689 -0.039961 -0.004735 -0.015806 0.002266 ... -0.014672 -0.014434 -0.004289 -0.014115 -0.002484 -0.020811 -0.026684 -0.010740 -0.024305 0.003634
5051 -0.012349 -0.020542 0.009845 -0.018341 -0.004051 -0.026751 -0.012182 -0.019309 -0.017127 -0.007947 ... -0.003054 -0.019764 -0.012854 -0.017580 -0.006597 -0.014379 -0.013561 -0.022182 -0.044274 -0.001069
5052 -0.018762 -0.023350 -0.002286 -0.023453 -0.002324 -0.023141 -0.009755 -0.017285 -0.013223 -0.010814 ... -0.007429 -0.018966 -0.013135 -0.002689 0.000219 -0.022210 -0.017296 -0.021047 -0.030458 -0.005790
5053 -0.016194 -0.020058 0.001327 -0.018487 -0.003637 -0.021920 -0.019478 -0.016771 -0.017647 -0.006028 ... -0.005045 -0.020295 -0.011635 -0.017347 -0.000270 -0.016058 -0.009880 -0.010650 -0.041136 0.006200
5054 -0.021199 -0.028070 -0.016551 -0.014916 -0.003552 -0.017378 -0.027365 0.002366 -0.013555 -0.010481 ... -0.008137 -0.014298 -0.014035 -0.009887 -0.016047 -0.019440 -0.008347 -0.009308 -0.030365 -0.012091
5055 -0.019173 -0.032468 0.001422 -0.008005 -0.002588 -0.029478 -0.024252 -0.008996 -0.030966 -0.007655 ... -0.000796 -0.025821 -0.008951 -0.027916 -0.015959 -0.016102 -0.008641 0.009708 -0.037998 0.001281
5056 -0.017126 -0.025907 -0.016363 0.004764 -0.002774 -0.005283 -0.030929 0.006493 -0.031301 -0.008680 ... -0.017370 -0.004735 -0.001412 -0.019755 0.005895 -0.012421 -0.007542 -0.002546 -0.021843 -0.014134
5057 -0.014173 -0.006387 -0.004921 -0.024131 -0.002004 -0.011987 -0.017906 -0.012592 -0.015745 -0.008782 ... -0.013968 -0.022908 0.005180 -0.027156 -0.023694 -0.016762 -0.011498 0.001105 -0.027240 -0.025659
5058 -0.019288 -0.009921 -0.001847 -0.020739 0.000231 -0.018879 -0.019011 -0.002928 -0.033253 -0.013138 ... -0.002815 -0.035159 -0.004014 -0.022392 -0.018345 -0.018262 -0.007139 0.004956 -0.027133 -0.027942
5059 -0.019033 -0.017975 0.003419 -0.015668 -0.001609 -0.034946 -0.021014 -0.021058 -0.009748 -0.006012 ... 0.002406 -0.028475 -0.011852 -0.015391 -0.012384 -0.028722 -0.003748 -0.004247 -0.030851 -0.007852
5060 -0.021733 -0.025472 -0.013546 -0.009081 -0.004181 -0.026810 -0.037382 -0.014160 -0.014759 -0.008995 ... -0.016115 -0.012925 -0.009256 -0.023707 -0.012554 -0.017123 -0.012856 -0.015861 -0.020736 -0.004432
5061 -0.017671 -0.036637 -0.015029 -0.015845 -0.004091 -0.010302 -0.014041 -0.012029 -0.012269 -0.011232 ... 0.008133 -0.013509 -0.008832 -0.004320 -0.010680 -0.018411 -0.003398 -0.011137 -0.038628 -0.017032
5062 -0.000115 -0.028487 -0.008242 -0.031964 -0.001042 -0.018175 -0.009899 -0.012353 -0.018911 -0.003586 ... -0.013459 -0.008254 -0.000724 -0.019110 0.012440 -0.020745 -0.009833 -0.003253 -0.021667 -0.019985

5063 rows × 4096 columns

Question 1: What does each row of the matrix dfeats represent?

Question 2: Where does the dimension of these rows come from, and how do we extract these features?

Hint: if you do not know the answers to the questions above, try running the following command:

model_1a_test = alexnet_imagenet_fc7(); print(model_1a_test)

Now, assuming that we have already used our network to extract features from all images in the dataset and stored them in the matrix dfeats (as done above), we will retrieve the top-15 images that are most similar to a query image. In our example, we will use the following image as a query:

In [8]:
q_idx = 11 # feel free to switch to another number afterwards, but test first with 11
In [9]:
# visualize top results for a given query
dataset.vis_top(dfeats, q_idx, ap_flag=True)
AP=12.73

To the right of the query image, we plot the best retrieval results, with decreasing similarity from left to right. Images in green frames are true matches, red frames are false matches, and gray frames are so-called 'junk' matches (images from the same landmark, but from angles too different or at wrong spots). Junk matches are ignored during the calculation of the AP.
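
The ranking and the AP described above can be sketched on toy data as follows. This is an illustration only, not the code behind dataset.vis_top: it assumes that similarity is the cosine similarity between feature vectors and that AP is the mean of the precision measured at each true match, with junk images removed from the ranking first. The function names, the toy features and the positive/junk sets are all made up for the example, and the tutorial's own evaluation code may interpolate precision differently.

```python
import numpy as np


def rank_by_similarity(dfeats, qfeat):
    """Return database indices sorted from most to least similar to the query."""
    # L2-normalize so that the dot product equals the cosine similarity.
    d = dfeats / np.linalg.norm(dfeats, axis=1, keepdims=True)
    q = qfeat / np.linalg.norm(qfeat)
    return np.argsort(-(d @ q), kind='stable')


def average_precision(ranking, positives, junk=()):
    """Mean precision at each true match, skipping junk images entirely."""
    positives, junk = set(positives), set(junk)
    hits, rank, precisions = 0, 0, []
    for idx in ranking:
        idx = int(idx)
        if idx in junk:
            continue  # junk neither helps nor hurts the score
        rank += 1
        if idx in positives:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


# Toy database of five 2-D features and one query (hypothetical values).
toy_feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0],
                      [0.7, 0.7], [-1.0, 0.0]])
toy_query = np.array([1.0, 0.0])

ranking = rank_by_similarity(toy_feats, toy_query)
ap = average_precision(ranking, positives={0, 2}, junk={1})
print(ranking, 'AP=%.2f' % (100 * ap))
```

Treating image 1 as a false match instead of junk (junk=()) lowers the AP in this toy case, because it then pushes the second true match one rank further down.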

Now we will use the t-SNE algorithm to project the features onto a 2-dimensional plot in which images with similar features end up close together:

In [10]:
do_tsne(dfeats, labels, classes, sec='1a')
applying PCA...
applying t-SNE...
[t-SNE] Computing 121 nearest neighbors...
[t-SNE] Indexed 5063 samples in 0.034s...
[t-SNE] Computed neighbors for 5063 samples in 9.650s...
[t-SNE] Computed conditional probabilities for sample 1000 / 5063
[t-SNE] Computed conditional probabilities for sample 2000 / 5063
[t-SNE] Computed conditional probabilities for sample 3000 / 5063
[t-SNE] Computed conditional probabilities for sample 4000 / 5063
[t-SNE] Computed conditional probabilities for sample 5000 / 5063
[t-SNE] Computed conditional probabilities for sample 5063 / 5063
[t-SNE] Mean sigma: 0.162765
[t-SNE] KL divergence after 250 iterations with early exaggeration: 81.701149
[t-SNE] KL divergence after 300 iterations: 2.383235
t-SNE done! Time elapsed: 21.600 seconds
2-dimensional t-sne plot.
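
The log above shows that do_tsne first applies PCA and then t-SNE. The implementation of do_tsne is not shown in this notebook, so the snippet below is only a rough sketch of what such a PCA step could look like, written with NumPy alone. The function name pca_reduce, the random stand-in features and the choice of 16 components are assumptions made for the example.

```python
import numpy as np


def pca_reduce(feats, n_components):
    """Project the features onto their top principal components."""
    centered = feats - feats.mean(axis=0)
    # Rows of vt are the principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


rng = np.random.default_rng(0)
feats = rng.normal(size=(200, 64))   # random stand-in for dfeats
reduced = pca_reduce(feats, 16)      # 16 is an arbitrary choice
print(reduced.shape)
```

The reduced matrix is what would then be handed to a t-SNE implementation (for example scikit-learn's TSNE) to obtain the 2-dimensional coordinates; reducing the dimension first makes the neighbor search in t-SNE cheaper.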

Question 3: What can be observed from the t-SNE visualization? Which classes 'cluster' well? Which do not?

b) Finetuning the created network on the Landmarks dataset

Now, we will see what happens when we fine-tune our off-the-shelf ImageNet network on the Landmarks dataset and then repeat the process above.

We can quickly compare some example images from both training datasets.

In [11]:
Image.open('figs/imagenet_ex.png')
Out[11]:
In [12]:
Image.open('figs/lm_ex.png')
Out[12]:

Question 4: Should we get better results? What should change? Why?

In [13]:
model_1b = alexnet_lm() # instantiate the model that has been fine-tuned on Landmarks

print(model_1b) # show the network details
AlexNet(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
    (1): ReLU(inplace)
    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (4): ReLU(inplace)
    (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): ReLU(inplace)
    (8): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): ReLU(inplace)
    (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace)
    (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (classifier): Sequential(
    (0): Dropout(p=0.5)
    (1): Linear(in_features=9216, out_features=4096, bias=True)
    (2): ReLU(inplace)
    (3): Dropout(p=0.5)
    (4): Linear(in_features=4096, out_features=4096, bias=True)
    (5): ReLU(inplace)
    (6): Linear(in_features=4096, out_features=586, bias=True)
  )
)

Compare with the model we had before:

In [14]:
print(model_1a)
AlexNet(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
    (1): ReLU(inplace)
    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (4): ReLU(inplace)
    (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): ReLU(inplace)
    (8): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): ReLU(inplace)
    (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace)
    (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (classifier): Sequential(
    (0): Dropout(p=0.5)
    (1): Linear(in_features=9216, out_features=4096, bias=True)
    (2): ReLU(inplace)
    (3): Dropout(p=0.5)
    (4): Linear(in_features=4096, out_features=4096, bias=True)
    (5): ReLU(inplace)
    (6): Linear(in_features=4096, out_features=1000, bias=True)
  )
)

Question 5: Why do we change the last layer of the AlexNet architecture?

Question 6: How do we initialize the layers of model_1b for finetuning?

Let's now repeat the same process as before, this time using image features extracted with the fine-tuned network.

In [15]:
dfeats = np.load(models_dict['alexnet-cls-lm-fc7']['dataset'])
pd.DataFrame(dfeats)
Out[15]:
0 1 2 3 4 5 6 7 8 9 ... 4086 4087 4088 4089 4090 4091 4092 4093 4094 4095
0 -0.028551 -0.026623 -0.010203 -0.010896 0.001626 -0.025944 -0.020717 0.011002 -0.045042 -0.010722 ... 0.003847 -0.028382 -0.011679 -0.015839 -0.017878 0.002055 -0.012127 -0.008400 0.000425 -0.017276
1 -0.018635 -0.012787 -0.004841 -0.023212 -0.001751 -0.024415 -0.032008 -0.008180 -0.030614 -0.007919 ... 0.001333 -0.027415 -0.014296 -0.016405 -0.014020 -0.015200 -0.015713 -0.003962 -0.011068 -0.008547
2 -0.032243 -0.020034 0.004824 -0.005007 0.001027 -0.021642 -0.026012 -0.015621 -0.032444 -0.006335 ... -0.001029 -0.019675 -0.012103 -0.015862 -0.018569 -0.009880 -0.014367 -0.003542 -0.029109 -0.000369
3 -0.013617 -0.016270 0.001534 -0.014598 -0.001201 -0.019049 -0.020825 -0.022464 -0.020278 -0.002925 ... 0.001838 -0.017435 -0.008794 -0.012514 -0.020492 -0.019942 -0.014706 0.003281 -0.045019 0.008413
4 -0.009342 -0.030747 -0.007650 -0.003674 -0.002459 -0.010953 -0.021755 -0.010861 -0.031161 0.001171 ... -0.010367 -0.023052 -0.015727 -0.018414 -0.020291 -0.031972 -0.019041 -0.008903 -0.031688 -0.010810
5 -0.029534 -0.016715 0.000056 -0.010092 0.000297 -0.022919 -0.028175 -0.015088 -0.038621 -0.005485 ... -0.006256 -0.016297 -0.013381 -0.013261 -0.011519 -0.013824 -0.010889 -0.005092 -0.038197 -0.002112
6 -0.017982 -0.005457 0.001470 -0.010093 0.000075 -0.019405 -0.028549 -0.014583 -0.007152 0.008021 ... -0.014570 -0.027468 -0.014251 -0.020437 -0.016302 -0.022584 -0.012315 -0.000965 -0.035350 0.007543
7 -0.021748 -0.017116 0.001957 -0.018670 -0.002048 -0.023776 -0.019693 -0.019385 -0.020138 0.000627 ... -0.011379 -0.028215 -0.016770 -0.019200 -0.013233 -0.017148 -0.015777 -0.008602 -0.034981 0.004236
8 -0.024694 -0.014167 0.006056 -0.014417 -0.000376 -0.015628 -0.021058 0.000901 -0.014985 -0.000961 ... -0.011121 -0.013275 -0.016904 -0.017208 -0.029457 -0.009145 -0.017290 0.009647 -0.025372 0.015104
9 -0.022311 0.019810 -0.007204 -0.024596 0.004135 -0.025079 -0.024800 -0.019038 0.002539 0.002821 ... -0.015700 -0.001785 -0.012003 -0.014416 -0.026256 -0.013071 -0.020733 -0.003333 -0.004687 0.004002
10 -0.019029 0.001114 -0.001359 -0.010971 -0.002517 -0.015807 -0.026431 -0.007757 -0.006357 0.003965 ... -0.001019 -0.019325 -0.006362 -0.010840 -0.018467 0.000757 -0.008865 -0.006804 -0.019182 -0.002699
11 -0.025168 -0.026132 0.004889 -0.008259 0.000814 -0.022118 -0.027081 -0.019286 -0.031497 -0.008915 ... -0.003989 -0.017037 -0.011590 -0.012645 -0.013149 -0.011585 -0.011122 0.001606 -0.031973 0.007855
12 -0.016623 -0.018003 0.002331 -0.009787 -0.004694 -0.024236 -0.034037 -0.013703 -0.029330 -0.004208 ... -0.009838 -0.021640 -0.013368 -0.014832 -0.015509 -0.021305 -0.019083 -0.011932 -0.010327 -0.011317
13 -0.029353 -0.032940 0.007514 -0.004610 0.000361 -0.018311 -0.026966 -0.013674 -0.031195 -0.009588 ... 0.000693 -0.023021 -0.011087 -0.018054 -0.015837 -0.007951 -0.006825 0.000475 -0.026045 -0.001976
14 -0.026197 -0.002524 -0.005353 -0.020280 0.000749 -0.028682 -0.032073 -0.000353 -0.029940 0.000056 ... -0.001363 -0.023103 -0.001140 -0.017572 -0.013519 -0.005257 0.001151 0.008127 -0.014630 -0.006125
15 -0.024481 0.012392 -0.008761 -0.025426 0.001896 -0.023800 -0.022350 0.008212 -0.006224 0.006695 ... -0.021701 -0.020794 -0.004683 -0.006079 -0.026301 -0.002026 -0.013750 -0.004168 -0.004243 -0.002119
16 -0.024368 -0.005999 -0.010947 -0.022314 0.002359 -0.029469 -0.020003 -0.003388 -0.024448 0.005132 ... -0.011303 -0.019375 -0.003796 -0.019948 -0.023761 0.000923 -0.017421 -0.010388 -0.006391 -0.009313
17 -0.022026 -0.031126 0.003294 -0.000261 -0.001995 -0.011783 -0.027193 -0.002053 -0.027752 -0.002420 ... -0.002276 -0.028174 -0.018417 -0.013027 -0.025181 -0.013538 -0.018883 0.001176 -0.021135 -0.001294
18 -0.007702 -0.009507 -0.005534 -0.025402 -0.000470 -0.004896 -0.026954 -0.004188 -0.000152 -0.013251 ... -0.001956 -0.021123 -0.013597 -0.005080 -0.013934 -0.019420 -0.009817 -0.006328 -0.012930 -0.005676
19 -0.003884 -0.023615 0.002270 -0.030834 -0.001828 -0.017886 -0.007803 -0.001403 -0.021545 -0.011980 ... -0.000466 -0.027074 -0.008360 -0.026614 -0.010088 -0.012548 -0.014887 -0.005277 -0.035450 0.000222
20 -0.016494 -0.010600 0.002542 -0.013778 -0.001539 -0.023797 -0.026076 -0.006283 -0.027809 -0.003893 ... -0.007151 -0.022339 -0.008701 -0.010697 -0.019777 -0.007409 -0.012172 -0.004435 -0.026842 -0.003616
21 -0.026353 -0.020201 -0.001892 -0.009232 0.001740 -0.018826 -0.031903 -0.001567 -0.035316 -0.006904 ... -0.005773 -0.023747 -0.011750 -0.015716 -0.012946 -0.012802 -0.006819 0.006455 -0.029566 -0.005083
22 -0.016668 -0.017563 0.004384 -0.016031 -0.003232 -0.017060 -0.017542 -0.017808 -0.017806 -0.001405 ... -0.009099 -0.021832 -0.015716 -0.020402 -0.020280 -0.023249 -0.017997 0.002153 -0.028843 0.012784
23 -0.030964 -0.037219 -0.003955 0.002291 -0.000572 -0.023875 -0.033905 -0.006193 -0.020811 -0.003830 ... -0.010303 -0.017239 -0.012055 -0.028273 -0.010938 -0.027183 -0.003085 -0.002383 -0.007402 0.006293
24 0.000821 -0.000129 -0.003788 -0.025275 -0.000904 -0.012620 -0.030948 -0.009335 -0.016613 -0.003147 ... 0.003367 -0.008248 -0.006514 -0.015732 -0.013707 -0.018249 -0.015989 -0.001071 -0.016814 -0.003472
25 -0.018186 -0.016131 0.002393 -0.013781 -0.003109 -0.016193 -0.027394 -0.016651 -0.023209 -0.001491 ... -0.006627 -0.021719 -0.015875 -0.017428 -0.016480 -0.012726 -0.019636 -0.011853 -0.038152 0.000708
26 -0.018845 -0.030311 -0.009118 0.010493 -0.000417 -0.024194 -0.020765 -0.015473 -0.027195 -0.001199 ... -0.017402 -0.022608 -0.008872 -0.026209 -0.024262 -0.032848 -0.009165 0.006407 -0.020251 -0.005034
27 -0.029329 -0.019229 -0.000018 -0.015378 0.001979 -0.026732 -0.029199 -0.009536 -0.022479 0.001072 ... -0.006367 -0.017225 -0.016677 -0.025049 -0.028151 -0.017256 -0.002296 0.003298 -0.022247 0.014804
28 -0.012088 -0.015670 0.003648 0.002827 -0.004373 -0.018010 -0.038437 -0.016898 -0.025711 -0.007648 ... -0.010798 -0.026026 -0.012404 -0.014917 -0.010070 -0.023810 -0.018125 0.000241 -0.007823 -0.009463
29 -0.024306 0.009044 0.001023 -0.024701 0.002631 -0.013689 -0.026019 -0.006568 -0.000480 -0.000821 ... -0.003377 -0.019526 -0.016006 -0.017642 -0.026713 -0.004874 0.000610 0.000489 -0.017628 -0.003426
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
5033 0.006420 -0.009578 -0.005831 -0.013147 -0.001402 -0.003528 -0.013629 -0.001294 -0.001539 0.002185 ... -0.014190 -0.023979 -0.005739 -0.023219 -0.011576 -0.007252 -0.003802 -0.016326 -0.003195 -0.010335
5034 -0.017702 -0.003512 -0.014482 -0.017439 -0.002590 -0.007487 -0.032187 -0.001065 -0.020984 -0.003690 ... -0.021651 0.003791 -0.021840 -0.006273 -0.009228 -0.005992 -0.013359 -0.024254 0.017078 -0.008074
5035 -0.006357 -0.014610 -0.005400 -0.004409 -0.003051 -0.000852 -0.020705 -0.006108 -0.012303 0.000914 ... -0.013306 -0.009400 -0.006150 -0.015215 0.002970 0.002417 -0.005234 -0.011817 -0.021143 -0.000435
5036 -0.013132 -0.000167 -0.012220 -0.014691 -0.004443 -0.010845 -0.024557 -0.003833 -0.025383 -0.000249 ... -0.021614 0.007515 -0.013237 -0.008551 -0.006950 0.003528 -0.026941 -0.019162 0.002786 0.005495
5037 -0.012219 -0.001765 -0.010703 -0.016852 -0.001854 0.004288 -0.026589 -0.006043 -0.006486 0.002065 ... -0.020455 0.007734 -0.014495 -0.007006 -0.009188 0.002024 -0.020959 -0.026230 0.001301 -0.003371
5038 -0.007859 -0.014488 -0.013675 -0.010781 -0.004298 -0.010800 -0.031975 0.001828 -0.001763 -0.006598 ... -0.014619 -0.000386 -0.017257 -0.015129 -0.001214 -0.013874 -0.017954 -0.012370 -0.007018 0.003740
5039 -0.017344 -0.005251 -0.011858 -0.014008 -0.002284 0.004483 -0.024964 -0.003019 -0.021594 0.001666 ... -0.022245 0.008928 -0.014098 -0.012326 0.003093 0.013601 -0.019516 -0.017722 -0.001062 -0.002711
5040 -0.011394 -0.007863 -0.010904 -0.002325 -0.002152 0.001368 -0.023616 -0.004785 -0.015303 -0.000598 ... -0.017030 -0.001325 -0.010531 -0.013973 0.008489 0.013469 -0.003738 -0.007958 -0.012742 0.004000
5041 -0.018215 -0.009682 -0.011052 -0.003607 -0.002381 0.008844 -0.028112 -0.002207 -0.006474 -0.004393 ... -0.017739 0.002413 -0.016578 -0.002015 0.000121 -0.002390 -0.015790 -0.020084 -0.015042 -0.005063
5042 -0.010062 -0.014636 -0.002621 -0.022113 -0.002457 -0.006488 -0.020937 -0.005678 -0.016605 0.002556 ... -0.019098 -0.016284 -0.012387 -0.019852 -0.010279 -0.004493 -0.015464 -0.017018 -0.002089 -0.013277
5043 -0.017791 -0.003897 -0.014394 -0.008513 -0.003563 0.002201 -0.023377 -0.000625 -0.022593 0.004198 ... -0.021249 0.006688 -0.013331 -0.010247 0.000963 0.012936 -0.016944 -0.020777 -0.004491 -0.002029
5044 -0.004529 -0.018705 -0.008422 -0.012981 -0.001302 -0.009094 -0.018053 -0.000053 -0.009407 -0.006132 ... -0.015746 -0.012963 -0.005772 -0.019039 -0.006105 -0.018168 -0.020136 -0.006094 -0.017336 -0.012794
5045 -0.005670 -0.008784 -0.006269 -0.009022 -0.004010 0.000542 -0.028960 -0.002236 -0.003281 -0.001485 ... -0.009960 -0.003804 -0.016770 -0.015547 -0.004309 -0.002458 -0.023523 -0.022853 -0.003019 -0.005913
5046 -0.013359 -0.016650 0.001126 -0.020403 0.000636 -0.016620 -0.009679 -0.002943 0.003730 -0.007977 ... 0.014311 -0.025762 -0.010937 -0.011417 -0.028236 -0.025882 -0.002954 0.000886 -0.026229 -0.008011
5047 -0.026598 -0.012581 -0.006012 -0.031201 -0.003350 -0.017720 -0.035028 0.000131 -0.031578 -0.006704 ... -0.005933 -0.016810 -0.017021 -0.018234 -0.022934 -0.012876 -0.024875 -0.011916 -0.025554 -0.009523
5048 -0.015849 -0.020363 -0.015536 -0.030947 0.001526 -0.014698 -0.025941 -0.001981 -0.025574 -0.002711 ... -0.004671 -0.022288 -0.007165 -0.016859 0.001346 -0.011183 0.005878 -0.008446 -0.033325 -0.008214
5049 -0.008075 0.013479 -0.009042 -0.033357 0.004742 -0.023792 -0.014297 0.011373 -0.012250 0.005919 ... -0.012319 -0.011883 -0.003950 -0.009940 -0.010904 -0.016405 -0.012694 -0.018617 -0.019431 -0.009097
5050 -0.010249 -0.016496 -0.018325 -0.013321 -0.002375 -0.015356 -0.036365 0.002534 -0.007947 0.001394 ... -0.015242 -0.011797 -0.013051 -0.006535 -0.002383 -0.016013 -0.019806 -0.015583 -0.011606 0.001955
5051 -0.010399 -0.020552 0.006542 -0.015312 -0.003772 -0.023825 -0.015864 -0.020212 -0.019906 -0.007924 ... 0.003604 -0.018092 -0.011586 -0.015804 -0.008587 -0.022641 -0.008844 -0.027847 -0.040063 -0.003700
5052 -0.016141 -0.014292 -0.004438 -0.024443 0.000635 -0.023823 -0.013773 -0.013266 -0.010187 -0.003639 ... -0.009514 -0.018513 -0.014831 -0.006144 -0.003890 -0.021774 -0.014871 -0.019948 -0.030730 0.006003
5053 -0.008508 -0.009331 -0.001154 -0.021030 -0.002939 -0.026079 -0.025988 -0.009764 -0.016129 -0.002305 ... -0.005719 -0.017598 -0.014405 -0.008024 -0.004864 -0.008965 -0.009902 -0.016587 -0.029152 0.008317
5054 -0.018439 -0.030131 -0.021204 -0.012668 -0.002737 -0.007751 -0.030791 0.005653 -0.004667 -0.011653 ... -0.008168 -0.015769 -0.016353 -0.003115 -0.014644 -0.015300 -0.005123 -0.010602 -0.019844 -0.015408
5055 -0.014956 -0.023755 -0.004367 -0.017992 -0.003297 -0.032099 -0.019246 -0.003738 -0.025705 -0.004469 ... 0.001394 -0.022784 -0.009626 -0.026092 -0.025188 -0.003590 -0.001294 0.017948 -0.025178 -0.001324
5056 -0.008895 -0.020823 -0.017807 -0.000939 -0.004310 -0.004891 -0.026444 0.015489 -0.034796 -0.011039 ... -0.013849 -0.012462 -0.004082 -0.011107 0.009729 -0.011318 0.001022 -0.004201 -0.014824 -0.016547
5057 -0.007354 0.008305 -0.002269 -0.025214 -0.001947 -0.012997 -0.010756 -0.009681 -0.009604 -0.005841 ... -0.015229 -0.022635 -0.001805 -0.020507 -0.028932 -0.012048 -0.015335 -0.005818 -0.025185 -0.024743
5058 -0.005668 -0.000681 -0.003375 -0.028822 -0.001499 -0.021600 -0.007049 0.001536 -0.023913 -0.015559 ... -0.001344 -0.039140 -0.008518 -0.023644 -0.023298 -0.013051 -0.006825 0.000780 -0.026958 -0.021929
5059 -0.019395 -0.014146 0.002322 -0.019709 -0.000981 -0.024933 -0.024210 -0.019986 0.000200 -0.003221 ... 0.005332 -0.032063 -0.013495 -0.017639 -0.018414 -0.032581 -0.010547 -0.010071 -0.027152 -0.005734
5060 -0.016897 -0.017160 -0.020578 -0.012219 -0.003498 -0.027340 -0.034940 -0.007107 -0.015662 -0.008644 ... -0.020079 -0.010580 -0.007265 -0.022040 -0.013817 -0.008921 -0.010515 -0.022058 -0.010114 0.002297
5061 -0.015042 -0.037659 -0.016388 -0.011204 -0.004749 0.002709 -0.015337 -0.008564 -0.019330 -0.013012 ... 0.009836 -0.011291 -0.011175 0.003504 -0.005572 -0.011331 0.005757 -0.014580 -0.030487 -0.016762
5062 0.007865 -0.012663 -0.010656 -0.037929 -0.000854 -0.011453 -0.005602 -0.009234 -0.013390 -0.002606 ... -0.019527 -0.004053 0.000578 -0.015989 0.014485 -0.015641 -0.005312 -0.007819 -0.016039 -0.012209

5063 rows × 4096 columns

Visualize the top-15 most similar images:

In [16]:
dataset.vis_top(dfeats, q_idx, ap_flag=True)
AP=24.67
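
The notebook does not show how vis_top ranks images or computes the AP value, so the following is only a minimal sketch of the usual procedure: rank the database descriptors by cosine similarity to the query, then average the precision at every rank where a relevant image appears. The names rank_by_cosine and average_precision and the toy data are hypothetical and not part of the course code. The official Oxford protocol also ignores "junk" images, which this sketch does not handle, and the sketch returns AP as a fraction while the notebook reports a percentage.

```python
import numpy as np

def rank_by_cosine(dfeats, q_feat):
    """Return database indices sorted from most to least similar to the query."""
    d = dfeats / np.linalg.norm(dfeats, axis=1, keepdims=True)  # L2-normalize rows
    q = q_feat / np.linalg.norm(q_feat)
    sims = d @ q                                                # cosine similarities
    return np.argsort(-sims, kind="stable")

def average_precision(ranked_idx, positives):
    """Mean of precision@k over the ranks k at which a positive is retrieved."""
    positives = set(int(p) for p in positives)
    hits, precisions = 0, []
    for k, idx in enumerate(ranked_idx, start=1):
        if int(idx) in positives:
            hits += 1
            precisions.append(hits / k)
    return float(np.mean(precisions)) if precisions else 0.0

# Toy database of four 2-D descriptors; images 0 and 3 are the relevant ones.
toy_feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.7, 0.7]])
toy_query = np.array([1.0, 0.0])

ranking = rank_by_cosine(toy_feats, toy_query)
print(ranking)                                          # order: 0, 1, 3, 2
print(average_precision(ranking, positives=[0, 3]))     # 5/6, about 0.833
```
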
In [17]:
do_tsne(dfeats, labels, classes, sec='1b')
applying PCA...
applying t-SNE...
[t-SNE] Computing 121 nearest neighbors...
[t-SNE] Indexed 5063 samples in 0.031s...
[t-SNE] Computed neighbors for 5063 samples in 9.624s...
[t-SNE] Computed conditional probabilities for sample 1000 / 5063
[t-SNE] Computed conditional probabilities for sample 2000 / 5063
[t-SNE] Computed conditional probabilities for sample 3000 / 5063
[t-SNE] Computed conditional probabilities for sample 4000 / 5063
[t-SNE] Computed conditional probabilities for sample 5000 / 5063
[t-SNE] Computed conditional probabilities for sample 5063 / 5063
[t-SNE] Mean sigma: 0.188440
[t-SNE] KL divergence after 250 iterations with early exaggeration: 82.426369
[t-SNE] KL divergence after 300 iterations: 2.448307
t-SNE done! Time elapsed: 21.110 seconds
2-dimensional t-sne plot.

Question 6: How does the visualization change after finetuning? What about the top results?

Question 7: Why do images need to be resized to 224x224 before they can be fed to AlexNet? How can this affect the results?

c) Replacing last max pooling layer with GeM layer

Now, we will replace the last max pooling layer of our network with a GeM layer and see how this affects the results. For this model, we remove all fully connected layers (classifier layers) and replace the last max pooling layer by an aggregation pooling layer (more details about this layer in the next subsection).
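
For readers who want a concrete picture of generalized-mean (GeM) pooling, here is a minimal NumPy sketch of the standard formula: each channel is pooled as the p-th root of the mean of its activations raised to the power p. The function name gem_pool, the eps clamp, and the toy feature-map shape are assumptions made for illustration. The course implementation in modules.py may differ, for example by making p learnable.

```python
import numpy as np

def gem_pool(fmap, p=3.0, eps=1e-6):
    """Generalized-mean pooling over the spatial positions of a (C, H, W) map.

    p = 1 reduces to average pooling; as p grows the result approaches max pooling.
    """
    x = np.clip(fmap, eps, None)   # GeM assumes non-negative activations (e.g. after ReLU)
    return np.mean(x ** p, axis=(1, 2)) ** (1.0 / p)

# Toy feature map with 256 channels on a 6x6 grid (shape chosen only for illustration).
fmap = np.random.default_rng(0).random((256, 6, 6))
desc = gem_pool(fmap, p=3.0)
print(desc.shape)  # (256,): one value per channel
```

Because the spatial dimensions are pooled away, the descriptor has one entry per channel regardless of the input resolution.
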

In [18]:
model_1c = alexnet_GeM() # instantiate the fine-tuned model with a GeM layer instead of max-pooling

print(model_1c) # show the network details. Can you identify what has changed?
AlexNet_RMAC(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
    (1): ReLU(inplace)
    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (4): ReLU(inplace)
    (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): ReLU(inplace)
    (8): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): ReLU(inplace)
    (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace)
    (12): GeneralizedMeanPooling(3, output_size=1)
  )
)

Compare with the model we had before:

In [19]:
print(model_1b) 
AlexNet(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
    (1): ReLU(inplace)
    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (4): ReLU(inplace)
    (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): ReLU(inplace)
    (8): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): ReLU(inplace)
    (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace)
    (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (classifier): Sequential(
    (0): Dropout(p=0.5)
    (1): Linear(in_features=9216, out_features=4096, bias=True)
    (2): ReLU(inplace)
    (3): Dropout(p=0.5)
    (4): Linear(in_features=4096, out_features=4096, bias=True)
    (5): ReLU(inplace)
    (6): Linear(in_features=4096, out_features=586, bias=True)
  )
)

As before, we assume that this model has already been used to extract features from all images, which we load into the dfeats variable:

In [20]:
dfeats = np.load(models_dict['alexnet-cls-lm-gem']['dataset'])

pd.DataFrame(dfeats)
Out[20]:
0 1 2 3 4 5 6 7 8 9 ... 246 247 248 249 250 251 252 253 254 255
0 0.005151 0.056662 0.052122 0.017439 0.030739 0.032248 0.017193 0.026330 0.030174 0.034045 ... 0.050055 0.066938 0.154829 0.096428 0.021463 0.016210 0.035672 0.019620 0.054269 0.019038
1 0.034120 0.071401 0.078033 0.049910 0.023599 0.033573 0.022876 0.036661 0.014827 0.086269 ... 0.065778 0.036902 0.090593 0.051248 0.063048 0.064353 0.053661 0.027867 0.023930 0.049543
2 0.007011 0.074704 0.164091 0.012346 0.069913 0.047311 0.042899 0.022559 0.012220 0.114409 ... 0.085460 0.111921 0.041570 0.065033 0.066166 0.022868 0.017038 0.024712 0.040535 0.081485
3 0.014066 0.049824 0.098039 0.004307 0.043525 0.058548 0.040201 0.063092 0.005686 0.083354 ... 0.061489 0.075462 0.098459 0.053052 0.017032 0.021953 0.036715 0.061517 0.032584 0.097201
4 0.011313 0.045487 0.075781 0.044627 0.067544 0.051965 0.011382 0.018712 0.078668 0.024781 ... 0.024270 0.007171 0.022858 0.040107 0.099365 0.056661 0.027706 0.033087 0.029374 0.050560
5 0.009180 0.056345 0.122468 0.015349 0.055747 0.050703 0.041465 0.022168 0.008406 0.066443 ... 0.089200 0.072786 0.069762 0.064242 0.082554 0.021690 0.012690 0.044537 0.047515 0.042655
6 0.023618 0.045454 0.086994 0.032801 0.070096 0.040060 0.090586 0.029268 0.006227 0.041859 ... 0.070179 0.060486 0.050735 0.094458 0.052998 0.013305 0.031416 0.117383 0.029904 0.049392
7 0.014857 0.061572 0.082389 0.022662 0.064479 0.044807 0.051715 0.024757 0.007730 0.068231 ... 0.092871 0.070698 0.057779 0.107351 0.093670 0.017195 0.022955 0.097138 0.020293 0.098813
8 0.011679 0.056523 0.118667 0.019096 0.048913 0.058091 0.037737 0.032599 0.023495 0.061999 ... 0.036423 0.076230 0.073997 0.080319 0.050413 0.048640 0.033584 0.064320 0.070968 0.157156
9 0.047907 0.017687 0.073092 0.038385 0.017961 0.036246 0.087389 0.065425 0.012710 0.042743 ... 0.017636 0.046933 0.075315 0.085939 0.042873 0.045241 0.021665 0.105014 0.039806 0.071544
10 0.040134 0.082027 0.079348 0.005069 0.053638 0.025807 0.039630 0.051860 0.005556 0.093104 ... 0.120638 0.048492 0.095749 0.103957 0.033923 0.022031 0.079892 0.087979 0.059068 0.098300
11 0.013589 0.036012 0.130612 0.013350 0.094142 0.060279 0.042111 0.039546 0.011836 0.067965 ... 0.087644 0.075952 0.039526 0.082399 0.053348 0.049308 0.060390 0.035273 0.045369 0.041221
12 0.013278 0.076693 0.094863 0.069896 0.038686 0.041020 0.044609 0.042061 0.060944 0.077607 ... 0.100332 0.059323 0.046667 0.063437 0.078413 0.047825 0.040537 0.047600 0.050504 0.051961
13 0.016970 0.045127 0.131610 0.012027 0.058145 0.061185 0.027578 0.034400 0.014532 0.080855 ... 0.071732 0.129043 0.074302 0.067232 0.063565 0.024274 0.020252 0.042442 0.070562 0.050073
14 0.057795 0.202805 0.044659 0.027778 0.054118 0.039019 0.012211 0.039816 0.008413 0.124249 ... 0.120274 0.081481 0.083067 0.068621 0.039243 0.023893 0.070850 0.105038 0.042433 0.077379
15 0.006326 0.034968 0.054056 0.008170 0.051945 0.037683 0.101663 0.062767 0.012171 0.060984 ... 0.036226 0.084711 0.052153 0.059879 0.045709 0.045234 0.038582 0.057923 0.024984 0.059419
16 0.075042 0.105603 0.071173 0.022818 0.047048 0.024913 0.022176 0.042727 0.037902 0.058688 ... 0.130746 0.048090 0.064102 0.127262 0.042964 0.011893 0.047972 0.162462 0.027696 0.155392
17 0.000000 0.010426 0.177451 0.012087 0.042909 0.061503 0.025058 0.043557 0.041360 0.073495 ... 0.065702 0.056698 0.009290 0.103241 0.078331 0.034366 0.020100 0.031546 0.029936 0.068592
18 0.009209 0.049593 0.060717 0.051509 0.042109 0.056095 0.044524 0.061632 0.060945 0.095079 ... 0.030318 0.056512 0.027982 0.039313 0.044991 0.044408 0.021152 0.045364 0.029052 0.078699
19 0.002826 0.083090 0.083978 0.009371 0.051494 0.033789 0.061346 0.026867 0.051891 0.059732 ... 0.081031 0.046338 0.065250 0.057010 0.088411 0.023332 0.041245 0.050811 0.029943 0.052837
20 0.030186 0.097823 0.054781 0.043111 0.072452 0.041936 0.036108 0.026190 0.010674 0.078685 ... 0.047449 0.063439 0.097768 0.077138 0.039868 0.020710 0.023983 0.009042 0.041827 0.064521
21 0.012326 0.046549 0.150108 0.007219 0.067774 0.048741 0.020706 0.036746 0.012549 0.075017 ... 0.080359 0.090359 0.054180 0.073020 0.055195 0.030235 0.029159 0.037361 0.030508 0.039360
22 0.008484 0.047645 0.076901 0.017225 0.052598 0.049055 0.038167 0.029026 0.008698 0.059333 ... 0.088284 0.071765 0.045336 0.110738 0.062967 0.014659 0.027367 0.082127 0.011340 0.079721
23 0.037980 0.070222 0.097931 0.063178 0.068964 0.058489 0.027724 0.032391 0.041816 0.044047 ... 0.102094 0.042670 0.044916 0.067275 0.094652 0.028252 0.033078 0.093699 0.034202 0.026962
24 0.016891 0.088496 0.061640 0.011956 0.053816 0.086367 0.038484 0.038221 0.010931 0.100265 ... 0.028443 0.116413 0.108357 0.038803 0.028921 0.014889 0.050840 0.077369 0.023556 0.025721
25 0.025844 0.053226 0.090853 0.043200 0.075092 0.048094 0.080847 0.031406 0.024077 0.092045 ... 0.087482 0.070019 0.088319 0.091620 0.078335 0.017262 0.036700 0.098766 0.025113 0.042088
26 0.027433 0.053432 0.100859 0.043200 0.043688 0.057296 0.023425 0.019563 0.031812 0.020971 ... 0.039166 0.023661 0.020589 0.067996 0.037228 0.038071 0.019070 0.101733 0.047921 0.058474
27 0.082023 0.057357 0.092264 0.042386 0.033025 0.068182 0.058483 0.041052 0.012942 0.069580 ... 0.086463 0.052447 0.023506 0.080684 0.063771 0.027843 0.013668 0.101474 0.048364 0.062642
28 0.025576 0.099784 0.088543 0.075603 0.045919 0.047143 0.033007 0.021911 0.072259 0.053724 ... 0.096984 0.039739 0.055774 0.065759 0.078818 0.028928 0.043831 0.051812 0.031017 0.031801
29 0.019374 0.043660 0.059435 0.019326 0.036148 0.032975 0.089522 0.082891 0.014726 0.035597 ... 0.026807 0.045378 0.076961 0.084510 0.014635 0.054951 0.037131 0.062087 0.025507 0.056522
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
5033 0.069258 0.066406 0.064676 0.065130 0.069851 0.042466 0.042384 0.073795 0.057951 0.073403 ... 0.103573 0.017623 0.045139 0.155690 0.089644 0.028910 0.038313 0.083473 0.011870 0.088079
5034 0.017841 0.046154 0.049344 0.103726 0.033553 0.041186 0.066124 0.107628 0.057930 0.065298 ... 0.066266 0.040025 0.064533 0.065890 0.072393 0.074924 0.038376 0.080679 0.025623 0.058393
5035 0.034983 0.033725 0.025041 0.083443 0.045847 0.043794 0.024244 0.049117 0.052597 0.035946 ... 0.034166 0.014157 0.084814 0.041455 0.063689 0.077281 0.022533 0.220163 0.013991 0.038263
5036 0.008204 0.051476 0.046322 0.071735 0.025193 0.036535 0.031345 0.079805 0.067165 0.062823 ... 0.071338 0.013240 0.079150 0.087603 0.033707 0.063579 0.031172 0.174233 0.023339 0.056870
5037 0.029345 0.049605 0.044511 0.071995 0.031499 0.047150 0.029887 0.086709 0.074062 0.041849 ... 0.073029 0.026284 0.032913 0.090364 0.036599 0.070467 0.025696 0.165910 0.011168 0.053785
5038 0.030109 0.081026 0.040171 0.137879 0.021160 0.043428 0.055324 0.090430 0.079881 0.046318 ... 0.038807 0.014852 0.039762 0.106410 0.029291 0.085213 0.042557 0.067354 0.030712 0.131160
5039 0.030937 0.048205 0.049411 0.105316 0.025428 0.025376 0.045115 0.077877 0.061180 0.066954 ... 0.044652 0.022537 0.104536 0.066528 0.039529 0.068218 0.037903 0.178465 0.011121 0.069935
5040 0.030655 0.032366 0.061563 0.123675 0.051264 0.012891 0.036419 0.039971 0.065684 0.056742 ... 0.045144 0.007596 0.094985 0.066315 0.047509 0.064236 0.024209 0.196815 0.021513 0.050018
5041 0.057358 0.040392 0.046493 0.102083 0.048882 0.044490 0.014242 0.082598 0.065664 0.098445 ... 0.023089 0.043101 0.058406 0.112691 0.015209 0.085615 0.018075 0.159296 0.003101 0.102476
5042 0.046114 0.056570 0.078639 0.059628 0.082358 0.051407 0.048675 0.049287 0.069359 0.066939 ... 0.107898 0.044618 0.050647 0.138664 0.089654 0.026652 0.065828 0.075167 0.025128 0.070158
5043 0.014858 0.036234 0.053320 0.081475 0.034916 0.037280 0.033755 0.086347 0.061071 0.052802 ... 0.047322 0.018498 0.095116 0.079417 0.029967 0.071276 0.027015 0.158880 0.011544 0.061599
5044 0.047769 0.018607 0.061311 0.199089 0.054627 0.048451 0.035488 0.063513 0.074824 0.134793 ... 0.011126 0.048340 0.024343 0.099708 0.048101 0.096552 0.062438 0.190639 0.012582 0.111926
5045 0.018171 0.044412 0.060260 0.091735 0.022055 0.046595 0.038483 0.099331 0.119372 0.057180 ... 0.070113 0.027150 0.023124 0.097168 0.035365 0.071890 0.028581 0.100454 0.008122 0.037203
5046 0.006111 0.064864 0.145881 0.034025 0.057225 0.103108 0.022331 0.076244 0.017688 0.120088 ... 0.103000 0.183992 0.097400 0.113637 0.032371 0.012295 0.041830 0.094915 0.000584 0.103440
5047 0.038380 0.094551 0.079936 0.019081 0.036741 0.051707 0.022397 0.032479 0.012749 0.064269 ... 0.111825 0.053127 0.017311 0.093873 0.070256 0.029326 0.030925 0.044722 0.017033 0.035041
5048 0.056846 0.062356 0.030797 0.110795 0.066364 0.067607 0.040369 0.069544 0.078017 0.044480 ... 0.009595 0.028833 0.053760 0.033196 0.030798 0.056244 0.043615 0.039491 0.046893 0.009490
5049 0.021163 0.061478 0.054005 0.066639 0.059412 0.023093 0.026058 0.094341 0.048172 0.075962 ... 0.019588 0.027928 0.049166 0.090979 0.026760 0.017534 0.077566 0.116003 0.008728 0.223768
5050 0.071494 0.037742 0.065402 0.077504 0.070301 0.063105 0.092179 0.069164 0.086799 0.096009 ... 0.050352 0.013583 0.184880 0.079655 0.031610 0.052077 0.041793 0.037079 0.048054 0.014539
5051 0.025705 0.147001 0.028410 0.087724 0.069197 0.090096 0.018450 0.024476 0.043880 0.063357 ... 0.122494 0.157826 0.026084 0.104547 0.077279 0.015865 0.034141 0.061950 0.021646 0.083769
5052 0.030671 0.050688 0.072678 0.070686 0.057578 0.034748 0.025960 0.030828 0.073492 0.051261 ... 0.060162 0.030322 0.043048 0.074061 0.038736 0.037335 0.040511 0.105118 0.017552 0.030946
5053 0.035973 0.077500 0.070373 0.061437 0.063724 0.034948 0.021556 0.044030 0.068548 0.062784 ... 0.038145 0.012310 0.029291 0.097589 0.033541 0.036488 0.044730 0.035994 0.029310 0.031589
5054 0.005166 0.050830 0.047770 0.179075 0.014049 0.006651 0.015482 0.061673 0.059912 0.128115 ... 0.015203 0.000000 0.025558 0.032710 0.033019 0.088505 0.046323 0.015469 0.005910 0.017931
5055 0.027437 0.099508 0.077453 0.000907 0.067092 0.130077 0.088450 0.074893 0.009080 0.082444 ... 0.043861 0.143186 0.043850 0.073110 0.034601 0.001809 0.050356 0.039323 0.033348 0.088424
5056 0.056407 0.133746 0.047994 0.115665 0.051492 0.076584 0.033125 0.051031 0.172310 0.035039 ... 0.038161 0.038963 0.052692 0.060100 0.038729 0.075714 0.057351 0.055267 0.031492 0.023432
5057 0.057751 0.069819 0.048474 0.009305 0.034402 0.092841 0.091284 0.049932 0.018692 0.070836 ... 0.036570 0.084541 0.018679 0.072888 0.095547 0.057751 0.026306 0.110610 0.042966 0.057995
5058 0.053958 0.064273 0.055438 0.009574 0.042110 0.087502 0.061285 0.046032 0.042663 0.054065 ... 0.061994 0.088464 0.034731 0.036467 0.108213 0.032121 0.035588 0.057016 0.032229 0.036974
5059 0.000419 0.138693 0.069809 0.068864 0.038240 0.058684 0.048065 0.026493 0.026489 0.089685 ... 0.039828 0.073082 0.120836 0.049364 0.042480 0.030539 0.021643 0.088092 0.008376 0.066456
5060 0.030570 0.091367 0.058614 0.164003 0.046816 0.064614 0.052051 0.039348 0.112728 0.073559 ... 0.007061 0.034318 0.039612 0.027174 0.014225 0.058526 0.059292 0.070048 0.013437 0.052927
5061 0.017336 0.174790 0.000000 0.145110 0.045158 0.045114 0.022854 0.016130 0.061173 0.029872 ... 0.023042 0.000000 0.073594 0.023865 0.030643 0.046525 0.039664 0.060015 0.034125 0.016545
5062 0.033893 0.028315 0.008374 0.102024 0.076824 0.083252 0.127166 0.067250 0.126780 0.048312 ... 0.013666 0.066033 0.032234 0.077458 0.019514 0.023064 0.063030 0.091603 0.032611 0.034119

5063 rows × 256 columns

Question 8: Why does the size of the feature representation change?

Question 9: Why is the size of the feature representation important for an image retrieval task?

Now, let's continue visualizing the top-15 most similar images:

In [21]:
dataset.vis_top(dfeats, q_idx, ap_flag=True)
AP=69.01
In [22]:
do_tsne(dfeats, labels, classes, sec='1c')
applying PCA...
applying t-SNE...
[t-SNE] Computing 121 nearest neighbors...
[t-SNE] Indexed 5063 samples in 0.030s...
[t-SNE] Computed neighbors for 5063 samples in 9.596s...
[t-SNE] Computed conditional probabilities for sample 1000 / 5063
[t-SNE] Computed conditional probabilities for sample 2000 / 5063
[t-SNE] Computed conditional probabilities for sample 3000 / 5063
[t-SNE] Computed conditional probabilities for sample 4000 / 5063
[t-SNE] Computed conditional probabilities for sample 5000 / 5063
[t-SNE] Computed conditional probabilities for sample 5063 / 5063
[t-SNE] Mean sigma: 0.135077
[t-SNE] KL divergence after 250 iterations with early exaggeration: 81.966179
[t-SNE] KL divergence after 300 iterations: 2.426583
t-SNE done! Time elapsed: 26.802 seconds
2-dimensional t-sne plot.

Question 10: How does the aggregation layer change the t-SNE visualization?

Question 11: Can we see some structure in the clusters of similarly labeled images?

d) ResNet18 architecture with GeM pooling

Now, we will replace the base architecture of our network (the backbone) with a ResNet18 architecture.

In [23]:
model_0 = resnet18()      # instantiate one model with average pooling and another 
model_1d = resnet18_GeM() # with GeM pooling with the same ResNet18 architecture

print(model_0.adpool)     # Show how the last layers of the two models are different
print(model_1d.adpool)
AvgPool2d(kernel_size=7, stride=1, padding=0)
GeneralizedMeanPooling(3, output_size=1)

Question 12: Why do we replace the average pooling layer of the original ResNet18 architecture with a generalized mean pooling layer?

Question 13: What operation is the layer model_1d.adpool doing?

  • Hint: You can see the code of the generalized mean pooling in file modules.py
In [40]:
model_1d.adpool
Out[40]:
GeneralizedMeanPooling(3, output_size=1)

Now let's do the same as before and visualize the features and the top-15 most similar images, this time using a different query image:

In [57]:
q_idx = 411

Now, let's load the Oxford features from the ResNet18 model and visualize the top-15 results for the given query index:

In [56]:
do_tsne(dfeats, labels, classes, sec='1d')
applying PCA...
applying t-SNE...
[t-SNE] Computing 121 nearest neighbors...
[t-SNE] Indexed 5063 samples in 0.033s...
[t-SNE] Computed neighbors for 5063 samples in 9.624s...
[t-SNE] Computed conditional probabilities for sample 1000 / 5063
[t-SNE] Computed conditional probabilities for sample 2000 / 5063
[t-SNE] Computed conditional probabilities for sample 3000 / 5063
[t-SNE] Computed conditional probabilities for sample 4000 / 5063
[t-SNE] Computed conditional probabilities for sample 5000 / 5063
[t-SNE] Computed conditional probabilities for sample 5063 / 5063
[t-SNE] Mean sigma: 0.246002
[t-SNE] KL divergence after 250 iterations with early exaggeration: 81.283081
[t-SNE] KL divergence after 300 iterations: 2.268624
t-SNE done! Time elapsed: 29.080 seconds
2-dimensional t-sne plot.

Question 14: How does this model compare with model 1c, which was trained on the same dataset for the same task?

Question 15: How does it compare to the finetuned models of 1b?

e) PCA Whitening

Now we will investigate the effects of whitening our descriptors and queries. We will not be changing anything in the network.
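
As a rough illustration of what whitening does to a set of descriptors, the sketch below learns a PCA whitening transform on one set of vectors and applies it to the same data. The helper names fit_pca_whitening and apply_whitening, the eps value, the final L2 normalization, and the synthetic data are all assumptions. The notebook only loads features that were already whitened with a PCA learnt on the landmarks dataset, and its exact procedure is not shown.

```python
import numpy as np

def fit_pca_whitening(train_feats, eps=1e-8):
    """Learn the mean and whitening projection from training descriptors (N, D)."""
    mean = train_feats.mean(axis=0)
    cov = np.cov(train_feats - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)       # symmetric eigendecomposition
    proj = eigvecs / np.sqrt(eigvals + eps)      # rescale each eigenvector column
    return mean, proj

def apply_whitening(feats, mean, proj, l2_normalize=True):
    """Center and project descriptors; optionally L2-normalize each row afterwards."""
    white = (feats - mean) @ proj
    if l2_normalize:
        white = white / np.linalg.norm(white, axis=1, keepdims=True)
    return white

# Synthetic correlated descriptors standing in for real network outputs.
rng = np.random.default_rng(0)
mixing = np.array([[2.0, 0.5, 0.0, 0.0],
                   [0.0, 1.0, 0.3, 0.0],
                   [0.0, 0.0, 1.5, 0.2],
                   [0.0, 0.0, 0.0, 0.5]])
train = rng.normal(size=(500, 4)) @ mixing

mean, proj = fit_pca_whitening(train)
white = apply_whitening(train, mean, proj, l2_normalize=False)
print(np.round(np.cov(white, rowvar=False), 3))  # close to the identity matrix
```

The same mean and projection would be applied to both the database descriptors and the query descriptors, so that they remain comparable.
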

In [27]:
# We use a PCA learnt on landmarks to whiten the output features of 'resnet18-cls-lm-gem'
dfeats = np.load(models_dict['resnet18-cls-lm-gem-pcaw']['dataset'])
qfeats = np.load(models_dict['resnet18-cls-lm-gem-pcaw']['queries'])
dataset.vis_top(dfeats, q_idx, q_feat=qfeats[q_idx], ap_flag=True)
AP=19.37

Visualize the data with t-SNE (excluding unlabeled images)

In [28]:
do_tsne(dfeats, labels, classes, sec='1e-1')
applying PCA...
applying t-SNE...
[t-SNE] Computing 121 nearest neighbors...
[t-SNE] Indexed 5063 samples in 0.031s...
[t-SNE] Computed neighbors for 5063 samples in 9.691s...
[t-SNE] Computed conditional probabilities for sample 1000 / 5063
[t-SNE] Computed conditional probabilities for sample 2000 / 5063
[t-SNE] Computed conditional probabilities for sample 3000 / 5063
[t-SNE] Computed conditional probabilities for sample 4000 / 5063
[t-SNE] Computed conditional probabilities for sample 5000 / 5063
[t-SNE] Computed conditional probabilities for sample 5063 / 5063
[t-SNE] Mean sigma: 0.250935
[t-SNE] KL divergence after 250 iterations with early exaggeration: 85.496727
[t-SNE] KL divergence after 300 iterations: 2.907229
t-SNE done! Time elapsed: 25.638 seconds
2-dimensional t-sne plot.

Visualize the data with t-SNE (including unlabeled images)

In [29]:
do_tsne(dfeats, labels, classes, sec='1e-2', show_unlabeled=True)
applying PCA...
applying t-SNE...
[t-SNE] Computing 121 nearest neighbors...
[t-SNE] Indexed 5063 samples in 0.033s...
[t-SNE] Computed neighbors for 5063 samples in 9.717s...
[t-SNE] Computed conditional probabilities for sample 1000 / 5063
[t-SNE] Computed conditional probabilities for sample 2000 / 5063
[t-SNE] Computed conditional probabilities for sample 3000 / 5063
[t-SNE] Computed conditional probabilities for sample 4000 / 5063
[t-SNE] Computed conditional probabilities for sample 5000 / 5063
[t-SNE] Computed conditional probabilities for sample 5063 / 5063
[t-SNE] Mean sigma: 0.250952
[t-SNE] KL divergence after 250 iterations with early exaggeration: 85.511063
[t-SNE] KL divergence after 300 iterations: 2.915803
t-SNE done! Time elapsed: 20.310 seconds
2-dimensional t-sne plot.

Question 16: What can we say about the separation of the data when unlabeled images are included?

Question 17: What about the distribution of the unlabeled features?

Question 18: How can we train a model to separate labeled from unlabeled data?

f) Finetuning on Landmarks for retrieval

Now we train the architecture presented in item e) end-to-end for the retrieval task. The architecture includes an FC layer that replaces the PCA projection.

In [30]:
dataset.vis_triplets(nplots=5) 
# will print 5 examples of triplets (tuples with a query, a positive, and a negative)
Triplet for landmark balliol.
Triplet for landmark magdalen.
Triplet for landmark magdalen.
Triplet for landmark all_souls.
Triplet for landmark bodleian.
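
To make the role of such triplets concrete, here is a minimal sketch of a common form of the triplet loss: it is zero when the negative is farther from the query than the positive by at least a margin, and positive otherwise. The function name, the squared Euclidean distance, and the margin value of 0.1 are assumptions for illustration; the exact loss used to train the course models is not shown in this notebook.

```python
import numpy as np

def triplet_loss(q, pos, neg, margin=0.1):
    """Hinge loss on squared distances: max(0, margin + d(q, pos) - d(q, neg))."""
    d_pos = np.sum((q - pos) ** 2)
    d_neg = np.sum((q - neg) ** 2)
    return max(0.0, margin + d_pos - d_neg)

q = np.array([1.0, 0.0])
pos = np.array([0.9, 0.1])          # descriptor of an image of the same landmark
easy_neg = np.array([0.0, 1.0])     # far from the query: contributes no loss
hard_neg = np.array([0.95, 0.05])   # closer to the query than the positive

print(triplet_loss(q, pos, easy_neg))   # 0.0
print(triplet_loss(q, pos, hard_neg))   # about 0.115
```

Only triplets that violate the margin, such as the second one, produce a gradient, which is why the choice of negatives matters during training.
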

Now, let's visualize the top results as before:

In [31]:
# load Oxford features from ResNet18 model trained with triplet loss
dfeats = np.load(models_dict['resnet18-rnk-lm-gem']['dataset'])
qfeats = np.load(models_dict['resnet18-rnk-lm-gem']['queries'])
dataset.vis_top(dfeats, q_idx, q_feat=qfeats[q_idx], ap_flag=True)
AP=55.24

Visualize the data with t-SNE (excluding unlabeled images)

In [32]:
do_tsne(dfeats, labels, classes, sec='1f-1')
applying PCA...
applying t-SNE...
[t-SNE] Computing 121 nearest neighbors...
[t-SNE] Indexed 5063 samples in 0.031s...
[t-SNE] Computed neighbors for 5063 samples in 9.610s...
[t-SNE] Computed conditional probabilities for sample 1000 / 5063
[t-SNE] Computed conditional probabilities for sample 2000 / 5063
[t-SNE] Computed conditional probabilities for sample 3000 / 5063
[t-SNE] Computed conditional probabilities for sample 4000 / 5063
[t-SNE] Computed conditional probabilities for sample 5000 / 5063
[t-SNE] Computed conditional probabilities for sample 5063 / 5063
[t-SNE] Mean sigma: 0.262229
[t-SNE] KL divergence after 250 iterations with early exaggeration: 81.931740
[t-SNE] KL divergence after 300 iterations: 2.449680
t-SNE done! Time elapsed: 22.910 seconds
2-dimensional t-sne plot.

Visualize the data with t-SNE (including unlabeled images)

In [33]:
do_tsne(dfeats, labels, classes, sec='1f-1', show_unlabeled=True)
applying PCA...
applying t-SNE...
[t-SNE] Computing 121 nearest neighbors...
[t-SNE] Indexed 5063 samples in 0.034s...
[t-SNE] Computed neighbors for 5063 samples in 9.732s...
[t-SNE] Computed conditional probabilities for sample 1000 / 5063
[t-SNE] Computed conditional probabilities for sample 2000 / 5063
[t-SNE] Computed conditional probabilities for sample 3000 / 5063
[t-SNE] Computed conditional probabilities for sample 4000 / 5063
[t-SNE] Computed conditional probabilities for sample 5000 / 5063
[t-SNE] Computed conditional probabilities for sample 5063 / 5063
[t-SNE] Mean sigma: 0.262246
[t-SNE] KL divergence after 250 iterations with early exaggeration: 81.928589
[t-SNE] KL divergence after 300 iterations: 2.448281
t-SNE done! Time elapsed: 26.741 seconds
2-dimensional t-sne plot.

Question 19: Compare the plots that include unlabeled data for the model trained for retrieval (with triplet loss) and for the model trained for classification in the previous subsection. How do they differ?

g) Data augmentation and multi-resolution

Let's now check the effect of adding data augmentation to the training by comparing models that have been trained with and without it.

We will load features extracted by a model that has been trained with the following data augmentations: cropping, pixel jittering, rotation, and tilting. This means that the model has been trained on the original images and on their transformed versions. Please note that not every transformation is useful for every class or image, but we cannot know in advance how the pictures were taken or what characterizes each individual class.

For example, cropping is useful when the landmark of interest is usually not found at the center of the image (e.g. selfies taken in front of the Eiffel Tower).
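
Two of these augmentations, cropping and pixel jittering, can be sketched with NumPy alone, as shown here. The function names, the crop size, and the jitter range are assumptions chosen for illustration, and the random array merely stands in for a real photo. Rotation and tilting require an image library and are left out. The actual augmentation pipeline used to train the course models is not shown in this notebook.

```python
import numpy as np

def random_crop(img, crop_h, crop_w, rng):
    """Cut a random crop_h x crop_w window out of an (H, W, C) image."""
    h, w = img.shape[:2]
    top = rng.integers(0, h - crop_h + 1)
    left = rng.integers(0, w - crop_w + 1)
    return img[top:top + crop_h, left:left + crop_w]

def pixel_jitter(img, max_delta, rng):
    """Add small uniform noise to every pixel and keep values in [0, 255]."""
    noise = rng.uniform(-max_delta, max_delta, size=img.shape)
    return np.clip(img.astype(np.float64) + noise, 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(300, 400, 3), dtype=np.uint8)  # stand-in for a photo

augmented = pixel_jitter(random_crop(image, 224, 224, rng), max_delta=10, rng=rng)
print(augmented.shape)  # (224, 224, 3)
```

During training, a new random crop and jitter would typically be drawn every time an image is sampled, so the network rarely sees exactly the same input twice.
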

Another standard practice besides data augmentation is to consider the same picture at several resolutions. There are multiple ways to combine the features extracted from those images, such as average pooling or spatial pyramids.

Using a model trained with data augmentation, we now extract features at 4 different resolutions and average the outputs.
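
The averaging step can be sketched as follows: extract one descriptor per resolution, L2-normalize each, average them, and normalize the result again. Everything in this sketch is a stand-in: the four scale factors, the nearest-neighbour resize, and especially toy_extractor, which replaces the real network with a simple per-channel mean. Whether the course code normalizes before averaging is also an assumption.

```python
import numpy as np

def resize_nearest(img, scale):
    """Nearest-neighbour resize of an (H, W, C) array (stand-in for a real resize)."""
    h, w = img.shape[:2]
    new_h, new_w = max(1, int(round(h * scale))), max(1, int(round(w * scale)))
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    return img[rows][:, cols]

def l2n(v):
    """L2-normalize a vector."""
    return v / np.linalg.norm(v)

def multi_resolution_descriptor(img, extract_fn, scales):
    """Average the normalized descriptors from several scales, then renormalize."""
    descs = [l2n(extract_fn(resize_nearest(img, s))) for s in scales]
    return l2n(np.mean(descs, axis=0))

def toy_extractor(img):
    """Hypothetical stand-in for a CNN: the mean value of each channel."""
    return img.reshape(-1, img.shape[-1]).mean(axis=0)

image = np.random.default_rng(0).random((60, 80, 3))
desc = multi_resolution_descriptor(image, toy_extractor, scales=(0.5, 0.75, 1.0, 1.25))
print(desc.shape)  # (3,)
```
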

Let's visualize the top results just like before:

In [34]:
dfeats = np.load(models_dict['resnet18-rnk-lm-gem-da-mr']['dataset'])
qfeats = np.load(models_dict['resnet18-rnk-lm-gem-da-mr']['queries'])
dataset.vis_top(dfeats, q_idx, q_feat=qfeats[q_idx], ap_flag=True)
AP=58.36

Visualize the data with t-SNE (excluding unlabeled images)

In [35]:
do_tsne(dfeats, labels, classes, sec='1g')
applying PCA...
applying t-SNE...
[t-SNE] Computing 121 nearest neighbors...
[t-SNE] Indexed 5063 samples in 0.038s...
[t-SNE] Computed neighbors for 5063 samples in 9.805s...
[t-SNE] Computed conditional probabilities for sample 1000 / 5063
[t-SNE] Computed conditional probabilities for sample 2000 / 5063
[t-SNE] Computed conditional probabilities for sample 3000 / 5063
[t-SNE] Computed conditional probabilities for sample 4000 / 5063
[t-SNE] Computed conditional probabilities for sample 5000 / 5063
[t-SNE] Computed conditional probabilities for sample 5063 / 5063
[t-SNE] Mean sigma: 0.223889
[t-SNE] KL divergence after 250 iterations with early exaggeration: 81.493790
[t-SNE] KL divergence after 300 iterations: 2.362153
t-SNE done! Time elapsed: 27.299 seconds
2-dimensional t-sne plot.
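The log above shows a PCA step before t-SNE. Reducing the dimension first is a common way to speed up t-SNE and remove noise. The sketch below illustrates that step only, with plain NumPy; the implementation inside `do_tsne` is not shown in this notebook, so the function name and the number of components used here are assumptions.

```python
import numpy as np

def pca_reduce(feats, n_components):
    """Project the rows of feats onto their top n_components principal directions."""
    centered = feats - feats.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

rng = np.random.default_rng(0)
toy_feats = rng.normal(size=(200, 64))   # stand-in for the descriptors in dfeats
reduced = pca_reduce(toy_feats, 16)      # hypothetical target dimension
print(reduced.shape)
```

The reduced matrix would then be passed to a t-SNE implementation to obtain the 2-dimensional coordinates that are plotted.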

Question 20: What is the difference in AP between a model that has been trained with data augmentation and one that has been trained without it?

Question 21: What about the clustering? Why do you believe some of the classes have not been adequately clustered yet?

Question 22: What other data augmentation or pooling techniques would you suggest to improve results? Why?

h) Improved architecture

Finally, we upgrade the backbone architecture from ResNet18 to ResNet50.

In [36]:
dfeats = np.load(models_dict['resnet50-rnk-lm-gem-da-mr']['dataset'])
qfeats = np.load(models_dict['resnet50-rnk-lm-gem-da-mr']['queries'])
dataset.vis_top(dfeats, q_idx, q_feat=qfeats[q_idx], ap_flag=True)
AP=64.59

Visualize the data with t-SNE (excluding unlabeled images)

In [37]:
do_tsne(dfeats, labels, classes, sec='1h')
applying PCA...
applying t-SNE...
[t-SNE] Computing 121 nearest neighbors...
[t-SNE] Indexed 5063 samples in 0.033s...
[t-SNE] Computed neighbors for 5063 samples in 9.702s...
[t-SNE] Computed conditional probabilities for sample 1000 / 5063
[t-SNE] Computed conditional probabilities for sample 2000 / 5063
[t-SNE] Computed conditional probabilities for sample 3000 / 5063
[t-SNE] Computed conditional probabilities for sample 4000 / 5063
[t-SNE] Computed conditional probabilities for sample 5000 / 5063
[t-SNE] Computed conditional probabilities for sample 5063 / 5063
[t-SNE] Mean sigma: 0.246012
[t-SNE] KL divergence after 250 iterations with early exaggeration: 81.284798
[t-SNE] KL divergence after 300 iterations: 2.269293
t-SNE done! Time elapsed: 30.220 seconds
2-dimensional t-sne plot.

Question 24: Why does using a larger architecture result in a higher AP? Is this always going to be the case?